Duration modeling using cumulative duration probability and speaking rate compensation

Authors

Tae-Young Yang

Ji-Sung Kim

Chungyong Lee

Dae Hee Youn

Il-Whan Cha

Abstract

A duration modeling scheme and a speaking rate compensation technique are presented for an HMM-based connected digit recognizer. The proposed duration modeling technique uses a cumulative duration probability, which can also be used to obtain the duration bounds for bounded duration modeling. One advantage of the proposed technique is that the cumulative duration probability can be applied directly in the Viterbi decoding procedure, without additional postprocessing; it therefore governs the state and word transitions at each frame. To alleviate the problems caused by fast or slow speech, a modification to bounded duration modeling that accounts for speaking rate is described. Experimental results on Korean connected digit recognition show the effectiveness of the proposed duration modeling scheme and the speaking rate compensation technique.
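The abstract does not give the formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a state's duration statistics are available as a histogram of frame counts, that the cumulative duration probability is the empirical P(D <= d), that duration bounds are taken at a chosen tail mass, and that speaking rate compensation simply rescales the bounds by a relative rate factor. The function names and the `tail` and `rate` parameters are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def cumulative_duration_probability(duration_counts):
    """Return cdf where cdf[d - 1] = P(D <= d) for d = 1..len(duration_counts).

    duration_counts[d - 1] is how often the state lasted exactly d frames
    in training data (an assumed input format).
    """
    total = sum(duration_counts)
    if total <= 0:
        raise ValueError("need at least one observed duration")
    return [c / total for c in accumulate(duration_counts)]


def duration_bounds(cdf, tail):
    """Assumed rule: bounds are the smallest durations whose cumulative
    probability reaches `tail` and `1 - tail` respectively."""
    lower = bisect_left(cdf, tail) + 1
    upper = min(bisect_left(cdf, 1.0 - tail) + 1, len(cdf))
    return lower, upper


def rate_compensated_bounds(bounds, rate):
    """Assumed compensation: divide both bounds by the relative speaking
    rate (rate > 1 means faster speech, so shorter durations are allowed)."""
    lower, upper = bounds
    return max(1, round(lower / rate)), max(1, round(upper / rate))


# Hypothetical histogram for one HMM state: durations of 1..6 frames.
counts = [0, 3, 5, 8, 3, 1]
cdf = cumulative_duration_probability(counts)
bounds = duration_bounds(cdf, tail=0.1)
slow_bounds = rate_compensated_bounds(bounds, rate=0.5)
```

In a decoder, bounds like these could be checked frame by frame inside the Viterbi recursion, forbidding a state exit before the lower bound and forcing one at the upper bound, which is one way to read the claim that no postprocessing is needed.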


Similar resources

Synthesis of fast speech with interpolation of adapted HSMMs and its evaluation by blind and sighted listeners

In this paper we evaluate a method for generating synthetic speech at high speaking rates based on the interpolation of hidden semi-Markov models (HSMMs) trained on speech data recorded at normal and fast speaking rates. The subjective evaluation was carried out with both blind listeners, who are used to very fast speaking rates, and sighted listeners. We show that we can achieve a better intel...


Duration modeling for HMM-based speech synthesis

This paper proposes a new approach to state duration modeling for HMM-based speech synthesis. A set of state durations of each phoneme HMM is modeled by a multi-dimensional Gaussian distribution, and duration models are clustered using a decision tree based context clustering technique. In the synthesis stage, state durations are determined by using the state duration models. In this paper, we ...


A Study of Tones and Tempo in Continuous Mandarin Digit Strings and Their Application in Telephone Quality Speech Recognition

Prosodic cues (namely, fundamental frequency, energy and duration) provide important information for speech. For a tonal language such as Chinese, fundamental frequency (F0) plays a critical role in characterizing tone as well, which is an essential phonemic feature. In this paper, we describe our work on duration and tone modeling for telephone-quality continuous Mandarin digits, and the appli...


Speaking rate normalization with lattice-based context-dependent phoneme duration modeling for personalized speech recognizers on mobile devices

Voice access of cloud applications including social networks using mobile devices becomes attractive today. And personalized speech recognizers over mobile devices become feasible because most mobile devices have only a single user. Speaking rate variation is known to be an important source of performance degradation for spontaneous speech recognition. Speaking rate is speaker dependent, it cha...


Determinants of Unemployment Duration in Iran

This study investigates the effect of individual characteristics on the unemployment duration of job seekers in Iran, using information from the 2018 Labor Force Survey and applying nonparametric, semi-parametric and parametric methods. The results show that the probability of employment is higher for married males and married females than for single people. Regional differences in...




Publication date: 1998
